Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval
Internal identifier: 000D23 (Main/Exploration); previous: 000D22; next: 000D24
Authors: SHIJIAN LU [Singapore]; LINLIN LI [Singapore]; CHEW LIM TAN [Singapore]
Source:
- IEEE transactions on pattern analysis and machine intelligence [ISSN 0162-8828]; 2008.
French descriptors
- Pascal (Inist)
- Intelligence artificielle, Analyse forme, Recherche documentaire, Recherche image, Recherche information, Reconnaissance image, Image optique, Reconnaissance optique caractère, Reconnaissance caractère, Traitement image, Topologie, Interrogation base donnée, Analyse documentaire, Analyse image, Mesure forme, Annotation, Mot clé, Type document.
- Wicri :
- Intelligence artificielle, Recherche documentaire.
English descriptors
- KwdEn :
- Annotation, Artificial intelligence, Character recognition, Database query, Document analysis, Document retrieval, Document types, Image analysis, Image processing, Image recognition, Image retrieval, Information retrieval, Keyword, Optical character recognition, Optical image, Pattern analysis, Shape measurement, Topology.
Abstract
This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
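The idea of annotating each word with a shape code and then matching codes instead of recognized text can be illustrated with a minimal sketch. It is not the authors' coding scheme: it works on plain lowercase text rather than word images, the character classes (`ASCENDERS`, `DESCENDERS`, `HOLES`) are assumptions for a typical Latin typeface, water reservoirs are left out because they require the image, and the names `char_code`, `word_shape_code`, and `search` are hypothetical.

```python
# Illustrative sketch only: text-level stand-in for image-based word shape coding.
# The character classes below are assumptions for a typical lowercase Latin typeface.
ASCENDERS = set("bdfhklt")    # strokes rising above the x-height
DESCENDERS = set("gjpqy")     # strokes falling below the baseline
HOLES = set("abdegopq")       # characters with an enclosed region


def char_code(ch):
    """Map one character to a coarse shape symbol (zone, plus 'o' for a hole)."""
    if ch in ASCENDERS:
        zone = "A"
    elif ch in DESCENDERS:
        zone = "D"
    else:
        zone = "x"  # fits within the x-height, or is not a modeled letter
    return zone + ("o" if ch in HOLES else "")


def word_shape_code(word):
    """Concatenate the per-character symbols into a word shape code."""
    return "-".join(char_code(c) for c in word.lower())


def search(keyword, documents):
    """Return names of documents containing a word whose code matches the keyword's."""
    target = word_shape_code(keyword)
    return [
        name
        for name, text in documents.items()
        if any(word_shape_code(w) == target for w in text.split())
    ]


if __name__ == "__main__":
    docs = {"a": "the dog ran", "b": "a cat sat"}
    print(word_shape_code("dog"))
    print(search("dog", docs))
```

Because the symbols are coarse, distinct words can share a code (in this sketch "dog" and "bog" do), so a match on codes narrows the candidates rather than proving that the word is present.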
Affiliations:
- Country: Singapore
- Organization: National University of Singapore
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000242
- to stream PascalFrancis, to step Curation: 000537
- to stream PascalFrancis, to step Checkpoint: 000245
- to stream Main, to step Merge: 000D35
- to stream Main, to step Curation: 000D23
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval</title>
<author><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</s1>
<s2>Singapore, 119613</s2>
<s3>SGP</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore, 119613</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">09-0012148</idno>
<date when="2008">2008</date>
<idno type="stanalyst">PASCAL 09-0012148 INIST</idno>
<idno type="RBID">Pascal:09-0012148</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000242</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000537</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000245</idno>
<idno type="wicri:doubleKey">0162-8828:2008:Shijian Lu:document:image:retrieval</idno>
<idno type="wicri:Area/Main/Merge">000D35</idno>
<idno type="wicri:Area/Main/Curation">000D23</idno>
<idno type="wicri:Area/Main/Exploration">000D23</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval</title>
<author><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</s1>
<s2>Singapore, 119613</s2>
<s3>SGP</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore, 119613</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
<imprint><date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Annotation</term>
<term>Artificial intelligence</term>
<term>Character recognition</term>
<term>Database query</term>
<term>Document analysis</term>
<term>Document retrieval</term>
<term>Document types</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Image retrieval</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Optical character recognition</term>
<term>Optical image</term>
<term>Pattern analysis</term>
<term>Shape measurement</term>
<term>Topology</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Analyse forme</term>
<term>Recherche documentaire</term>
<term>Recherche image</term>
<term>Recherche information</term>
<term>Reconnaissance image</term>
<term>Image optique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Traitement image</term>
<term>Topologie</term>
<term>Interrogation base donnée</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Mesure forme</term>
<term>Annotation</term>
<term>Mot clé</term>
<term>Type document</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.</div>
</front>
</TEI>
<affiliations><list><country><li>Singapour</li>
</country>
<orgName><li>Université nationale de Singapour</li>
</orgName>
</list>
<tree><country name="Singapour"><noRegion><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
</noRegion>
<name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D23 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D23 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:09-0012148 |texte= Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval }}
This area was generated with Dilib version V0.6.32.